Validate slice annotations#5582
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
|
|
Hi @lchrzaszcz. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
4c0f6a2 to
4a8f2f1
Compare
4a8f2f1 to
ace4ac6
Compare
| } else if val < 1 { | ||
| allErrs = append(allErrs, field.Invalid(annotationsPath.Key(kueuealpha.PodSetSliceSizeAnnotation), sliceSizeValue, "must be greater than or equal to 1")) | ||
| } | ||
| } |
There was a problem hiding this comment.
Can we validate that it divides into pod count with no remainder?
There was a problem hiding this comment.
I think it's a good idea, but I think it'll get more complex, as we don't have easy access to number of pods in a PodSet in that stage (see PR description). I'll give it a look thought, maybe it's easy.
There was a problem hiding this comment.
Does it make the algorithm substantially harder? If not, then I would rather support also Jobs divided with reminder.
The API-wise it makes sense to me, that you may want to have a Job of size 1000 Pods split into racks of 16 nodes. It might get awkward to require a user to pad the Job to create 1008 Pods to split into racks equally.
There was a problem hiding this comment.
I don't think it will make algorithm much more complex. If we do not mind assuming that the last slice behaves like a full slice then it should work almost out of the box (corner case: if 1000 pods fit, but 1008 do not fit, then we also won't fit such workload)
There was a problem hiding this comment.
I think it is fair for simplicity to allow it with the remark about its use in documentation, for example in the limitations.
There was a problem hiding this comment.
Cool. I'll make sure to reflect it in the main PR and write a test for that.
There is still a caveat of checking if slice size do not exceed the number of pods in PodSet. However that will be a bigger change, so we can proceed with this PR while I work on adding that validation.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lchrzaszcz, mimowo The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
LGTM label has been added. DetailsGit tree hash: 17c89b17988fb33e33155fb9d8a263c4a241f0df |
|
Oh, I missed setting /ok-to-test. Approval triggered the tests, but they may potentially fail. |
|
/ok-to-test |
|
/lgtm |
|
LGTM label has been added. DetailsGit tree hash: 0503c84317a01f8ca5317cc188bd425dfa14b09b |
|
I don't think this requires a separate release note as it is part of 2-level scheduling in TAS #5353 |
What type of PR is this?
/kind feature
What this PR does / why we need it:
#5353 introduces two-level scheduling, however it does not add validation. This PR is a preparation PR that introduces validation for annotations
kueue.x-k8s.io/podset-slice-required-topologyandkueue.x-k8s.io/podset-slice-size.Which issue(s) this PR fixes:
Related to #5439
Special notes for your reviewer:
For better readability I've added tests only for Job. I see that we repeat the same tests for each (or at least most of) Job type. That would be copy-paste, so I've decided to present tests for Job for now and once I get an approval I'll add the rest of the tests or do it in next PR.
I think we should also validate that the slice size is not greater than the number of pods in PodSet, because if someone sets it to a huge number algorithm might calculate that there are 0 slices, which might have unexpected behavior. However that requires to know what is the size of a PodSet, which is not that easy on this stage of validation. I will address this in a follow-up PR.
Does this PR introduce a user-facing change?